Semi-Supervised Learning for Quantitative Structure-Activity Modeling

Authors

  • Jurica Levatic
  • Saso Dzeroski
  • Fran Supek
  • Tomislav Smuc
Abstract

In this study, we compare the performance of semi-supervised and supervised machine learning methods on various Quantitative Structure-Activity Relationship (QSAR) modeling problems in sets of chemical compounds. Semi-supervised learning uses unlabeled data in addition to labeled data, with the goal of building better predictive models than can be learned from labeled data alone. Typically, labeled QSAR datasets contain tens to hundreds of compounds, while unlabeled data are easily accessible via public databases containing thousands of chemical compounds; this makes QSAR modeling an attractive domain for semi-supervised learning. We tested four semi-supervised learning algorithms on three datasets and compared them to five commonly used supervised learning algorithms. While adding unlabeled data does help for certain pairings of dataset and method, semi-supervised learning is not clearly superior to supervised learning across the QSAR classification problems addressed in this study.
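The idea of learning from a few labeled compounds plus a large unlabeled pool can be illustrated with self-training, a generic semi-supervised scheme. This is a minimal sketch under stated assumptions, not the study's method: the abstract does not name the four algorithms it tested, the nearest-centroid classifier stands in for a real learner, the synthetic 2-D points stand in for molecular descriptors, and every function name below is hypothetical.

```python
import random


def centroids(X, y):
    """Fit a nearest-centroid 'model': the mean feature vector of each class."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
        counts[yi] = counts.get(yi, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def sqdist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def predict(model, x):
    """Return (label, margin); margin is the distance gap between the two nearest centroids."""
    ranked = sorted((sqdist(x, c), label) for label, c in model.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else float("inf")
    return ranked[0][1], margin


def self_train(X_lab, y_lab, X_unlab, rounds=5, per_round=20):
    """Self-training: repeatedly pseudo-label the most confident unlabeled points."""
    X, y = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    for _ in range(rounds):
        if not pool:
            break
        model = centroids(X, y)
        scored = [(predict(model, x), i) for i, x in enumerate(pool)]
        scored.sort(key=lambda t: t[0][1], reverse=True)  # largest margin first
        chosen = scored[:per_round]
        for (label, _), i in chosen:
            X.append(pool[i])
            y.append(label)
        taken = {i for _, i in chosen}
        pool = [x for i, x in enumerate(pool) if i not in taken]
    return centroids(X, y)


def make_blobs(rng, n_per_class):
    """Synthetic stand-in data: two Gaussian clusters (NOT real QSAR descriptors)."""
    X, y = [], []
    for label, (cx, cy) in enumerate([(0.0, 0.0), (3.0, 3.0)]):
        for _ in range(n_per_class):
            X.append([rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)])
            y.append(label)
    return X, y


def accuracy(model, X, y):
    hits = sum(predict(model, x)[0] == t for x, t in zip(X, y))
    return hits / len(y)


rng = random.Random(0)
X_lab, y_lab = make_blobs(rng, 5)      # few labeled examples
X_unlab, _ = make_blobs(rng, 100)      # larger pool; labels discarded
X_test, y_test = make_blobs(rng, 100)  # held-out evaluation set

supervised = centroids(X_lab, y_lab)
semi = self_train(X_lab, y_lab, X_unlab)
acc_sup = accuracy(supervised, X_test, y_test)
acc_semi = accuracy(semi, X_test, y_test)
print(f"supervised: {acc_sup:.3f}  self-training: {acc_semi:.3f}")
```

Whether the self-trained model beats the supervised baseline depends on the data and the learner, which is consistent with the abstract's finding that unlabeled data helps only for certain pairings of dataset and method.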

Similar resources

Improving prediction accuracy of drug activities by utilising unlabelled instances with feature selection

Molecular activities can be predicted by Quantitative Structure Activity Relationship (QSAR). Because of the high cost of experiments, the number of drug molecules with known activity is much smaller than the number with unknown activity, so predicting molecular activities by utilising unlabelled instances is an interesting problem. Here, Semi-Supervised Learning (SSL) is introduced and an SSL method, Co-Training is i...

Multi-Assay-Based Structure-Activity Relationship Models: Improving Structure-Activity Relationship Models by Incorporating Activity Information from Related Targets

Structure-activity relationship (SAR) models are used to inform and to guide the iterative optimization of chemical leads, and they play a fundamental role in modern drug discovery. In this paper, we present a new class of methods for building SAR models, referred to as multi-assay based, that utilize activity information from different targets. These methods first identify a set of targets tha...

Recurrent Ladder Networks

We propose a recurrent extension of the Ladder networks [22] whose structure is motivated by the inference required in hierarchical latent variable models. We demonstrate that the recurrent Ladder is able to handle a wide variety of complex learning tasks that benefit from iterative inference and temporal modeling. The architecture shows close-to-optimal results on temporal modeling of video da...

Using Graphs of Classifiers to Impose Constraints on Semi-supervised Relation Extraction

We propose a general approach to modeling semi-supervised learning constraints on unlabeled data. Both traditional supervised classification tasks and many natural semi-supervised learning heuristics can be approximated by specifying the desired outcome of walks through a graph of classifiers. We demonstrate the modeling capability of this approach in the task of relation extraction, and experim...

Generalized mixture models, semi-supervised learning, and unknown class inference

In this paper, we discuss generalized mixture models and related semi-supervised learning methods, and show how they can be used to provide explicit methods for unknown class inference. After a brief description of standard mixture modeling and current model-based semi-supervised learning methods, we provide the generalization and discuss its computational implementation using three-stage expec...

Journal:
  • Informatica (Slovenia)

Volume 37

Publication date: 2013